Modified Kneser-Ney Smoothing of n-gram Models

نویسنده

  • Frankie James
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sub-word Based Language Modeling for Amharic

This paper presents sub-word based language models for Amharic, a morphologically rich and under-resourced language. The language models have been developed (using an open source language modeling toolkit SRILM) with different n-gram order (2 to 5) and smoothing techniques. Among the developed models, the best performing one is a 5gram model with modified Kneser-Ney smoothing and with interpola...

متن کامل

Improved Smoothing for N-gram Language Models Based on Ordinary Counts

Kneser-Ney (1995) smoothing and its variants are generally recognized as having the best perplexity of any known method for estimating N-gram language models. Kneser-Ney smoothing, however, requires nonstandard N-gram counts for the lowerorder models used to smooth the highestorder model. For some applications, this makes Kneser-Ney smoothing inappropriate or inconvenient. In this paper, we int...

متن کامل

Less is More: Significance-Based N-gram Selection for Smaller, Better Language Models

The recent availability of large corpora for training N-gram language models has shown the utility of models of higher order than just trigrams. In this paper, we investigate methods to control the increase in model size resulting from applying standard methods at higher orders. We introduce significance-based N-gram selection, which not only reduces model size, but also improves perplexity for...

متن کامل

A Bayesian Interpretation of Interpolated Kneser-Ney NUS School of Computing Technical Report TRA2/06

Interpolated Kneser-Ney is one of the best smoothing methods for n-gram language models. Previous explanations for its superiority have been based on intuitive and empirical justifications of specific properties of the method. We propose a novel interpretation of interpolated Kneser-Ney as approximate inference in a hierarchical Bayesian model consisting of Pitman-Yor processes. As opposed to p...

متن کامل

A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes

We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999